NSF PAR Search | NSF Public Access Repository

Faithful Group Shapley Value

Lee, Kiljae; Liu, Ziqi; Tang, Weijing; Zhang, Yuan (September 2025, The Thirty-ninth Annual Conference on Neural Information Processing Systems)

Free, publicly-accessible full text available September 19, 2026
Higher-order accurate two-sample network inference and network hashing

https://doi.org/10.1080/01621459.2025.2520459

Shao, Meijia; Xia, Dong; Zhang, Yuan; Wu, Qiong; Chen, Shuo (July 2025, Journal of the American Statistical Association)

Free, publicly-accessible full text available July 3, 2026
Electrochemistry of MXenes and their sustainable energy applications

https://doi.org/10.1557/s43581-025-00130-9

Thakur, Anupma; Zhang, Yuan; Gogotsi, Yury; Anasori, Babak (March 2025, MRS Energy & Sustainability)

Free, publicly-accessible full text available March 31, 2026
Leave-One-Out Stable Conformal Prediction

Lee, Kiljae; Zhang, Yuan (January 2025, The Thirteenth International Conference on Learning Representations)

Conformal prediction (CP) is an important tool for distribution-free predictive uncertainty quantification. Yet a major challenge is to balance computational efficiency and prediction accuracy, particularly when many predictions are required. We propose Leave-One-Out Stable Conformal Prediction (LOO-StabCP), a novel method that uses algorithmic stability to speed up full conformal prediction without sample splitting. By leveraging leave-one-out stability, our method handles a large number of prediction requests much faster than the existing method RO-StabCP, which is based on replace-one stability. We derive stability bounds for several popular machine learning tools: regularized loss minimization (RLM) and stochastic gradient descent (SGD), as well as kernel methods, neural networks, and bagging. Our method is theoretically justified and demonstrates superior numerical performance on synthetic and real-world data. We applied our method to a screening problem, where its effective exploitation of training data led to improved test power compared to a state-of-the-art method based on split conformal prediction.
Free, publicly-accessible full text available January 22, 2026
U-Statistic Reduction: Higher-Order Accurate Risk Control and Statistical-Computational Trade-Off

https://doi.org/10.1080/01621459.2024.2448029

Shao, Meijia; Xia, Dong; Zhang, Yuan (January 2025, Journal of the American Statistical Association)

Full Text Available
A Physics-Informed Gaussian Mixture Neural Network to Extract Atomic Signals from Scanning Tunneling Microscope Images

https://doi.org/10.1109/URTC65039.2024.10937537

Li, Rockwell; Guenthner, Ryan; Zhang, Yuan (October 2024, IEEE)

Full Text Available
All-silicon active bound states in the continuum terahertz metamaterials

https://doi.org/10.1016/j.optlastec.2024.111176

Huang, Yuwei; Kaj, Kelson; Yang, Zhiwei; Alvarado, Erick; Man, Wenkuan; Zhang, Yuan; Ramaprasad, Varun; Averitt, Richard D; Zhang, Xin (December 2024, Optics & Laser Technology)

Full Text Available
A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research

https://doi.org/10.1038/s42256-025-01014-w

Zhang, Yuan; Sui, Xin; Pan, Feng; Yu, Kaixian; Li, Keqiao; Tian, Shubo; Erdengasileng, Arslan; Han, Qing; Wang, Wanjing; Wang, Jianan; et al (April 2025, Nature Machine Intelligence)

To address the rapid growth of scientific publications and data in biomedical research, knowledge graphs (KGs) have become a critical tool for integrating large volumes of heterogeneous data to enable efficient information retrieval and automated knowledge discovery. However, transforming unstructured scientific literature into KGs remains a significant challenge, with previous methods unable to achieve human-level accuracy. Here we used an information extraction pipeline that won first place in the LitCoin Natural Language Processing Challenge (2022) to construct a large-scale KG named iKraph using all PubMed abstracts. The extracted information matches human expert annotations and significantly exceeds the content of manually curated public databases. To enhance the KG’s comprehensiveness, we integrated relation data from 40 public databases and relation information inferred from high-throughput genomics data. This KG facilitates rigorous performance evaluation of automated knowledge discovery, which was infeasible in previous studies. We designed an interpretable, probabilistic-based inference method to identify indirect causal relations and applied it to real-time COVID-19 drug repurposing from March 2020 to May 2023. Our method identified around 1,200 candidate drugs in the first 4 months, with one-third of those discovered in the first 2 months later supported by clinical trials or PubMed publications. These outcomes are very challenging to attain through alternative approaches that lack a thorough understanding of the existing literature. A cloud-based platform (https://biokde.insilicom.com) was developed for academic users to access this rich structured data and associated tools.
Free, publicly-accessible full text available April 1, 2026
In Situ Raman and Fourier Transform Infrared Spectroscopy Studies of MXene−Electrolyte Interfaces

https://doi.org/10.1021/acsnano.5c03810

Parker, Tetiana; Zhang, Yuan; Shevchuk, Kateryna; Zhang, Teng; Khokhar, Vikash; Kim, Young-Hwan; Kadagishvili, Givi; Bugallo, David; Tanwar, Manushree; Davis, Ben; et al (June 2025, ACS Nano)
A multivariate to multivariate approach for voxel‐wise genome‐wide association analysis

https://doi.org/10.1002/sim.10101

Wu, Qiong; Zhang, Yuan; Huang, Xiaoqi; Ma, Tianzhou; Hong, L Elliot; Kochunov, Peter; Chen, Shuo (August 2024, Statistics in Medicine)

The joint analysis of imaging‐genetics data facilitates the systematic investigation of genetic effects on brain structures and functions with spatial specificity. We focus on voxel‐wise genome‐wide association analysis, which may involve trillions of single nucleotide polymorphism (SNP)‐voxel pairs. We attempt to identify underlying organized association patterns of SNP‐voxel pairs and understand the polygenic and pleiotropic networks on brain imaging traits. We propose a bi‐clique graph structure (i.e., a set of SNPs highly correlated with a cluster of voxels) for the systematic association pattern. Next, we develop computational strategies to detect latent SNP‐voxel bi‐cliques and an inference model for statistical testing. We further provide theoretical results to guarantee the accuracy of our computational algorithms and statistical inference. We validate our method by extensive simulation studies, and then apply it to the whole genome genetic and voxel‐level white matter integrity data collected from 1052 participants of the Human Connectome Project. The results demonstrate multiple genetic loci influencing white matter integrity measures on the splenium and genu of the corpus callosum.
Full Text Available
